Comparison of different Collection Fusion Models in Distributed Information Retrieval
Abstract
Distributed information retrieval comes into play when a user wants to obtain information from different sources in parallel. One of the challenges in this area is the Collection Fusion problem: the separate result lists returned by the underlying information retrieval (IR) systems must be fused into a single, global relevance-ranked list that reflects the user's information need. This paper examines several Collection Fusion models that satisfy certain restrictions: they should not use more parameters than common collections of distributed digital libraries provide, so that they can be deployed in a real-life distributed retrieval environment. The selected models are evaluated in a test system with interfaces to the IR systems Managing Gigabytes (MG) and Oracle. The quantitative analysis of the models was performed using test queries, test documents, and relevance judgments from TREC. Two test runs were executed: one with all four collections stored in MG, and another with three collections in MG and one in Oracle. The results of these test runs are discussed with respect to the test setup.
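To make the Collection Fusion problem concrete, the following is a minimal sketch of three baseline fusion strategies commonly discussed in the distributed IR literature: round-robin interleaving, raw-score merging, and min-max normalized score merging. These are illustrative assumptions, not necessarily the models evaluated in this paper. The function names, the `(doc_id, score)` result-list format, and the collection names in the example are hypothetical. Each strategy uses only what a typical result list exposes (rank positions and scores), in the spirit of the parameter restriction described above.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: each collection returns (doc_id, score) pairs,
# already sorted best-first by its own IR system.
ResultList = List[Tuple[str, float]]


def round_robin(lists: Dict[str, ResultList], k: int) -> List[str]:
    """Interleave the lists rank by rank, ignoring scores entirely."""
    if k <= 0:
        return []
    fused: List[str] = []
    depth = max((len(results) for results in lists.values()), default=0)
    for rank in range(depth):
        for name in sorted(lists):  # fixed collection order keeps output deterministic
            results = lists[name]
            if rank < len(results):
                fused.append(results[rank][0])
                if len(fused) == k:
                    return fused
    return fused


def raw_score_merge(lists: Dict[str, ResultList], k: int) -> List[str]:
    """Pool all results and sort by the scores exactly as each system reported them."""
    pooled = [pair for results in lists.values() for pair in results]
    pooled.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return [doc for doc, _ in pooled[:max(k, 0)]]


def normalized_score_merge(lists: Dict[str, ResultList], k: int) -> List[str]:
    """Min-max normalize each list's scores to [0, 1] before pooling."""
    pooled: List[Tuple[str, float]] = []
    for results in lists.values():
        if not results:
            continue
        scores = [score for _, score in results]
        low, high = min(scores), max(scores)
        span = high - low
        for doc, score in results:
            # A list whose scores are all equal gets 1.0 for every document.
            pooled.append((doc, (score - low) / span if span > 0 else 1.0))
    pooled.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in pooled[:max(k, 0)]]


if __name__ == "__main__":
    # Hypothetical runs: the two systems report scores on very different scales.
    runs = {
        "mg_a": [("a1", 12.0), ("a2", 8.0), ("a3", 2.0)],
        "oracle": [("o1", 0.9), ("o2", 0.5)],
    }
    print(round_robin(runs, 4))             # ['a1', 'o1', 'a2', 'o2']
    print(raw_score_merge(runs, 3))         # ['a1', 'a2', 'a3']
    print(normalized_score_merge(runs, 4))  # ['a1', 'o1', 'a2', 'a3']
```

With these inputs, raw-score merging lets the larger scores of the first system crowd out every result from the second, which illustrates why scores from heterogeneous IR systems are usually normalized, or ignored in favor of ranks, before fusion.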
Similar resources
Report on the TREC-5 Experiment: Data Fusion and Collection Fusion
This paper describes and evaluates a retrieval model that considers the problem of data fusion and collection fusion as two faces of the same coin. To establish a clear theoretical foundation for combining various sources of evidence provided either by different search schemes (data fusion) or by distributed information services (collection fusion), we have implemented a retrieval model based o...
Distributed IR for Digital Libraries
This paper examines technology developed to support large-scale distributed digital libraries. We describe the method used for harvesting collection information using standard information retrieval protocols and how this information is used in collection ranking and retrieval. The system that we have developed takes a probabilistic approach to distributed information retrieval using a Logistic r...
Applying the Expertise Retrieval Model to Find Expert Authors
This research applied the Expertise Retrieval model to finding expert authors and used Information Retrieval evaluation methods to measure the performance of those models. The current research is experimental. In addition, a variety of methods, including the survey method, were used in the research process. Various models were developed for finding expert authors, all built on a known ...
Cluster-based Model Fusion for Spontaneous Speech Retrieval
In this paper we present a new method for combining the results of different models in order to improve the performance on a difficult task: Information Retrieval from spontaneous speech. Our technique is based on clustering the training topics according to their tf-idf (term frequency-inverse document frequency) properties, and selecting the best models for each cluster. When the system runs o...
Collection Profiling for Collection Fusion in Distributed Information Retrieval Systems
Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval studies. In uncooperative environments, query-based sampling and normalizing scores based merging strategies are well-known approaches to solve such problems. However, such approaches only consider the content of the remote database and do not conside...
Publication date: 2000